Add RL token throughput and packing metrics #3877
Conversation
Co-authored-by: Jorge Albericio <jalbericiola@nvidia.com>
    Returns:
        Total compute tokens (num_bins * bin_size) on this rank.
    """
    if packing_context is None or packing_context.packed_trajs is None:
Your typing says that PackingContext cannot be None
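A minimal sketch of the mismatch being pointed out, assuming a hypothetical helper shaped like the diff above (the class, fields, and function name here are illustrative stand-ins, not the real Megatron code): annotating the parameter as `Optional[PackingContext]` makes the `is None` check consistent with the typing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackingContext:
    # Stand-in for the real PackingContext; only the fields used below.
    packed_trajs: Optional[list] = None
    num_bins: int = 0
    bin_size: int = 0


def get_packing_compute_tokens(packing_context: Optional[PackingContext]) -> int:
    """Return total compute tokens (num_bins * bin_size) on this rank."""
    # With Optional[...] in the signature, this None check agrees with
    # the annotation instead of contradicting it.
    if packing_context is None or packing_context.packed_trajs is None:
        return 0
    return packing_context.num_bins * packing_context.bin_size
```

The alternative fix, of course, is to keep the non-Optional annotation and drop the `packing_context is None` branch.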
/claude review
megatron/training/training.py
Outdated
    # Add tokens/sec to log string
    log_string += f' toks/s: {tokens_per_sec:.0f} |'
    log_string += f' toks/s/gpu: {tokens_per_sec_per_gpu:.0f} |'
compute_tokens is assigned here but never used. Was this intended for something (e.g., a log line or the packing_efficiency calculation)? If not, it should be removed to avoid confusion.
Suggested change:
    log_string += f' toks/s/gpu: {tokens_per_sec_per_gpu:.0f} |'
    actual_tokens = rl_utils.get_packing_actual_tokens(runtime_state.packing_context)
megatron/training/training.py
Outdated
    packing_efficiency = rl_utils.get_packing_efficiency(runtime_state.packing_context)

    # Add tokens/sec to log string
    log_string += f' toks/s: {tokens_per_sec:.0f} |'
Is this going to add this metric to the log for all training? I'm not sure we use this metric a lot in pretraining, so I'm nervous it might just be adding noise to the log.
I've moved all the extra metrics in training.py into a single if-block guarded by args.perform_rl_step; does that look good?
Force-pushed from b215575 to 2b2a0d3
    self.sequences_this_iteration_on_rank = 0
    self.latest_batch_num_sequences = 0
    # Derived throughput metrics (set by training_log, read by RLProfiler)
    self.tokens_per_sec = None
Please add field descriptions here.
    self.tokens_per_sec = None
    self.tokens_per_sec_per_gpu = None
    self.actual_tokens_per_sec = None
    self.actual_tokens_per_sec_per_gpu = None
Do we actually need _per_gpu variables here? How about we store tokens/actual_tokens and world_size and have a method that does the actual division?
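A rough sketch of that suggestion, with illustrative names (not the actual runtime-state class in this PR): store each global rate once plus `world_size`, and derive the per-GPU value in a method instead of keeping four separate fields.

```python
class ThroughputState:
    """Illustrative stand-in for the RL runtime state discussed above."""

    def __init__(self, world_size: int):
        self.world_size = world_size
        # Only the global rates are stored; per-GPU variants are derived.
        self.tokens_per_sec = None
        self.actual_tokens_per_sec = None

    def per_gpu(self, value):
        """Divide a global rate by world_size; None passes through unchanged."""
        if value is None:
            return None
        return value / self.world_size
```

A caller would then log `state.per_gpu(state.tokens_per_sec)` rather than reading a second stored field, so the two values can never drift out of sync.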
    log_string += ' number of nan iterations: {:3d} |'.format(total_loss_dict[nan_iters_key])

    # RL token throughput metrics.
    if args.perform_rl_step:
Should we move this to a function in the RL folder? training.py is becoming unreadable.
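One way that extraction could look, as a hedged sketch (the function name and its location, e.g. somewhere under the RL utilities rather than training.py, are hypothetical):

```python
def append_rl_throughput_to_log(log_string: str,
                                tokens_per_sec: float,
                                tokens_per_sec_per_gpu: float) -> str:
    """Build the RL throughput portion of the training log line,
    keeping training_log() itself short."""
    log_string += f' toks/s: {tokens_per_sec:.0f} |'
    log_string += f' toks/s/gpu: {tokens_per_sec_per_gpu:.0f} |'
    return log_string
```

training_log() would then call this helper inside the `args.perform_rl_step` guard, so all RL-specific formatting lives in one place.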
What does this PR do?

Contribution process

Pre-checks

Code review

Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS. Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review
For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned. For PRs outside megatron/core, this step is skipped.

Step 3: Approved
Once all required reviewers have approved, the Approved label is applied automatically.

Merge
Any member of mcore-engineers will be able to merge your PR.

For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.